Comparison of two optimization methods to derive energy parameters for protein folding: perceptron and Z score
Abstract
Two methods were proposed recently to derive energy parameters from known native protein conformations and corresponding sets of decoys. One is based on finding, by means of a perceptron learning scheme, energy parameters such that the native conformations have lower energies than the decoys. The second method maximizes the difference between the native energy and the average energy of the decoys, measured in terms of the width of the decoys' energy distribution (Z-score). Whereas the perceptron method is sensitive mainly to "outlier" (i.e. extremal) decoys, the Z-score optimization is governed by the high-density regions in decoy space. We compare the two methods by deriving contact energies for two very different sets of decoys: the first obtained for model lattice proteins and the second by threading. We find that the potentials derived by the two methods are of similar quality and fairly closely related. This finding indicates that standard, naturally occurring sets of decoys are distributed in a way that yields robust energy parameters (that are quite insensitive to the particular method used to derive them). The main practical implication of this finding is that it is not necessary to fine-tune the potential search method to the particular set of decoys used.

INTRODUCTION

To perform protein folding one assigns an energy $E$ to a protein sequence in a given conformation. One of the simplest approximations to the true energy is the pairwise contact approximation

$$E_{\mathrm{true}}(a,S) \simeq E_{\mathrm{pair}}(a,S,w) = \sum_{i\,\ldots}^{N} \ldots$$

[...] 1 indicates that the native conformation is lower in energy than the vast majority of decoys. However, even in this case some decoys can have an energy below $E_0$.

The perceptron method

VD (Vendruscolo and Domany) used a perceptron learning technique to find energy parameters $w$ for which the set of inequalities (2) is satisfied. The perceptron learning technique they used either converges to a solution $w$ of the inequalities (2), or provides a proof of the non-existence of such a solution.
For any conformation the condition Eq. (2) can be expressed as

$$w \cdot x^{\mu} > 0. \tag{8}$$

To see this, note that for any map $S_\mu$ the energy (1) is a linear function of the 210 contact energies that can appear, and it can be written as

$$E_{\mathrm{pair}}(a,S_\mu,w) = \sum_{c=1}^{210} N_c(S_\mu)\, w_c. \tag{9}$$

Here the index $c = 1, 2, \ldots, 210$ labels the different contacts that can appear and $N_c(S_\mu)$ is the total number of contacts of type $c$ that actually appear in map $S_\mu$. The difference between the energy of this map and that of the native map $S_0$ is, therefore,

$$\Delta E^{\mu} = \sum_{c=1}^{210} x_c^{\mu} w_c = w \cdot x^{\mu} \tag{10}$$

where we used the notation

$$x_c^{\mu} = N_c(S_\mu) - N_c(S_0) \tag{11}$$

and $S_0$ is the native map. Each candidate map $S_\mu$ is represented by a vector $x^{\mu}$, and hence the question raised above regarding stabilization of a sequence $a$ becomes: can one find a vector $w$ such that condition (8) holds for all $x^{\mu}$? If such a $w$ exists, it can be found by perceptron learning.

A perceptron is the simplest neural network [10]. It is designed to solve the following task. Given $P$ patterns (also called input vectors or examples) $x^{\mu}$, find a vector $w$ of weights such that the condition

$$h_\mu = w \cdot x^{\mu} > 0 \tag{12}$$

is satisfied for every example from a training set of $P$ patterns $x^{\mu}$, $\mu = 1, \ldots, P$. If such a $w$ exists for the training set, the problem is learnable; if not, it is unlearnable. We assume that the vector of "weights" $w$ is normalized,

$$w \cdot w = 1. \tag{13}$$

The vector $w$ is "learned" in the course of a training session. The $P$ patterns are presented cyclically; after presentation of pattern $\mu$ the weights $w$ are updated according to the following learning rule:

$$w' = \begin{cases} \dfrac{w + x^{\mu}}{|w + x^{\mu}|} & \text{if } w \cdot x^{\mu} < 0 \\[2ex] w & \text{otherwise} \end{cases} \tag{14}$$

This procedure is called learning since, when the present $w$ misses the correct "answer" $h_\mu > 0$ for example $\mu$, all weights are modified in a manner that reduces the error. No matter what initial guess for $w$ one takes, a convergence theorem guarantees that if a solution $w$ exists, it will be found in a finite number of training steps [10,11].
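As an illustration only, the cyclic learning rule of Eqs. (12)-(14) can be sketched in a few lines of Python with NumPy. This is a minimal sketch and not the authors' code: the function name `perceptron_learn`, the step size `eta`, the optional starting vector `w0` and the sweep limit `max_sweeps` are assumptions introduced here. With `eta = 1` the update is the one printed in Eq. (14); a pattern with zero field is treated as a violation because condition (12) is strict.

```python
import numpy as np


def perceptron_learn(X, eta=1.0, max_sweeps=1000, w0=None, seed=0):
    """Cyclic perceptron learning with renormalised weights.

    X   : array of shape (P, D); row mu is the pattern x^mu of Eq. (11).
    eta : step size (an assumption of this sketch; eta = 1 reproduces the
          update as printed in Eq. (14)).
    w0  : optional starting vector (hypothetical convenience parameter).
    Returns (w, solved); solved is True if a full sweep found no violation.
    """
    if w0 is None:
        # Random start with components uniform in [-1, 1].
        rng = np.random.default_rng(seed)
        w = rng.uniform(-1.0, 1.0, size=X.shape[1])
    else:
        w = np.array(w0, dtype=float)
    w /= np.linalg.norm(w)               # normalisation of Eq. (13)

    for _ in range(max_sweeps):
        updated = False
        for x in X:                      # present the patterns cyclically
            if w @ x <= 0.0:             # pattern violates w.x > 0, Eq. (12)
                w = w + eta * x
                w /= np.linalg.norm(w)   # restore |w| = 1 after the update
                updated = True
        if not updated:                  # one clean sweep: all inequalities hold
            return w, True
    return w, False
```

For the contact-energy problem each row of `X` would be the 210-component difference of contact counts between a decoy and the native map, so a returned `solved = True` means that every decoy in `X` lies above the native state in energy.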
For learnable problems there is a continuous set of solutions, among which one can find the optimal one, the perceptron of maximal stability [12,13,6]. This solution maximizes the smallest gap between the native energy and the respective first excited state of the $M$ proteins in the learning set. In the algorithm the condition (12) is replaced by

$$h_\mu = w \cdot x^{\mu} > c \tag{15}$$

where $c$ is a positive number that should be made as large as possible. At each time step the "worst" example $x^{\nu}$ is identified, namely the one such that

$$h_\nu = w \cdot x^{\nu} = \min_\mu\, w \cdot x^{\mu}. \tag{16}$$

This example is used to update the weights, again according to rule (14). The field $h_\nu(t)$ keeps changing at each time step $t$; the procedure is iterated until it levels off to its asymptote.

COMPARISON OF THE METHODS USING LATTICE PROTEINS

Lattice proteins constitute a simplified paradigm that represents many aspects of the real problem quite faithfully. Because of their relative simplicity, they have been used to test a wide variety of ideas on proteins, ranging from sequence design and folding dynamics to the calculation of free-energy landscapes. They form a well-controlled theoretical construct about which many basic questions can be asked, without the need to involve the added complexity of real polypeptide chains. In particular, short fully compact lattice proteins were used by MS (Mirny and Shakhnovich) to test the Z-score methodology; hence it is natural to use the same set as a testing ground for the perceptron method and for comparing it to the results obtained by Z-score.

The database of $M = 200$ proteins used by MS was set up as follows. They randomly chose 200 conformations on a 3 × 3 × 3 cube. Using the potential of Miyazawa and Jernigan (MJ) [14] (hereafter referred to as the "true" potential) they designed for each conformation a sequence which minimized the Z-score as a function of the sequence composition. The design method is standard Monte Carlo optimization in sequence space [15,16].
A version of the method that directly optimizes the Z-score, without the requirement of constant amino acid composition, was used. In the second part of their study they used the 200 sequences and structures in the database to optimize the Z-score as a function of $w$. In this way they found a solution $w_{ZL}$. The 200 structures used in this study were the same as in earlier work by Mirny and Shakhnovich [8], and the procedure of parameter derivation and its relation to the "true" input parameters are described in detail in [8].

We use here 198 of the 200 MS conformations and the corresponding optimal sequences. For each of the 198 proteins, all the 103346 conformations on the cube were considered as decoys, yielding $P = 198 \times 103345 \approx 20 \times 10^{6}$ inequalities. This is a very large number of examples to learn; fortunately, as we shall see, the structure of the problem facilitates our task considerably. Only a few of the $20 \times 10^{6}$ examples $x^{\mu}$ are relevant to the learning procedure.

As our first attempt to derive energy parameters, we used the standard perceptron learning rule [11]. The procedure is the following. We initialized the vector $w$ of parameters by drawing 210 random numbers uniformly distributed in the interval [-1,1]. In this way, at the start $P/2$ examples on average violate the $P$ inequalities Eq. (12). Then we ran cyclically through the $P$ inequalities, updating the vector $w$ each time a violation of an inequality was found. We derived three different solutions $w_1$, $w_2$, and $w_3$, each one obtained by starting from a different point in parameter space. Given the low complexity of this particular learning problem, the solutions $w_1$ and $w_3$ were found after only one sweep through the $P$ examples, during which 8 updates of $w$ were performed, and the solution $w_2$ was found after two sweeps, which involved 11 and 1 updates respectively.
We observe that in this context "low complexity" has the specific meaning that to change the sign of $P/2 \approx 10^{7}$ inequalities only about 10 updates are typically necessary. We found that the correlation coefficients between the solutions,

$$\rho_{i,j} = w_i \cdot w_j, \tag{17}$$

were quite small: $\rho_{1,2} = 0.65$, $\rho_{1,3} = 0.60$, and $\rho_{2,3} = 0.57$. This information is important since it measures the size of version space, i.e. that part of the parameter space whose points are solutions of Eqs. (2). A random initial guess for $w$ lies outside version space and "diffuses" towards it during the learning process. As soon as $w$ enters version space the learning process defined above stops. Hence our three solutions, which were generated starting from three uncorrelated random initial guesses, represent three typical vectors close to the boundary of version space; the angle between a pair of such vectors is about 53°.

As our second learning attempt we found the perceptron $w_{PL}$ of maximal stability. This solution is near the centre of version space. From the previous attempt, we understood that only very few examples are relevant for the learning process. Given this insight, we followed a more economical procedure than the previous one, which required sweeping each time through all the $P$ examples, the vast majority of which did not contribute to the learning.

For each sequence we generated 100 "important" low-energy examples. One way to do this is, as before, to start from an initial random choice for the parameters $w$ and to sweep once through the $P$ examples, updating $w$. By using the updated $w$, for each sequence we identified the 100 examples of lowest energy. Then a second random set of parameters $w$ was drawn and, again, the 100 examples of lowest energy were identified in the same way. Typically a few tens of structures are common to these two sets of 100.
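The selection of low-energy examples for one sequence, and the count of structures shared by two such selections, might be sketched as follows. This is a hedged illustration rather than the authors' procedure: the names `lowest_energy_indices` and `common_examples` are hypothetical, and the sketch assumes that each decoy is stored as a row of contact counts $N_c(S)$ so that its energy is the dot product with $w$, as in Eq. (9).

```python
import numpy as np


def lowest_energy_indices(decoy_counts, w, k=100):
    """Indices of the k decoys of lowest energy for one sequence.

    decoy_counts : array (n_decoys, D); row s holds the contact counts
                   N_c(S_s), so the energies are decoy_counts @ w (Eq. 9).
    w            : array (D,) of contact energies.
    """
    energies = decoy_counts @ w
    k = min(k, len(energies))
    return set(np.argsort(energies)[:k].tolist())


def common_examples(decoy_counts, w_a, w_b, k=100):
    """Number of decoys that are among the k lowest for both w_a and w_b."""
    low_a = lowest_energy_indices(decoy_counts, w_a, k)
    low_b = lowest_energy_indices(decoy_counts, w_b, k)
    return len(low_a & low_b)
```

Calling `common_examples` with two parameter vectors obtained from different random starts gives the size of the overlap between the two sets of 100 that the text describes.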
By taking 100 low-energy structures determined by either of the potentials we are including the lowest 10 or so structures of any other reasonable pairwise potential function. In this way we reduced the size of the learning task to $N_D = 19800$ examples. Once these "hard" examples were learned, we turned back to the full set of $20 \times 10^{6}$ examples to ascertain that the solution obtained indeed satisfies the entire set of inequalities.

In Fig. 1 we demonstrate that fewer than $P_{\mathrm{hard}} \simeq 100$ examples participated in the learning process. The figure shows the number of updates for each example that were necessary to converge to the optimal solution, sorted in decreasing order. In practice, around one half of the sequences did not contribute at all to the total $P_{\mathrm{hard}}$ and the remaining ones contributed one or very few examples. We found that the overlaps of $w_{PL}$ with the three non-optimal solutions are $\rho_{P,1} = 0.74$, $\rho_{P,2} = 0.71$, and $\rho_{P,3} = 0.66$, corresponding to a smaller angle (about 45°). For $w_{PL}$ the minimal gap between a native map and the lowest decoy above it is

$$\min_\mu\, w \cdot x^{\mu} = c_{PL} = 0.45.$$

Next we investigated the influence of the database size on the derived potential. To this effect, we obtained new solutions $w_{PM}$ using only a subset of $M$ proteins in the database. For example, for $M = 99$ proteins the correlation with the full solution $w_{PL}$ is $\rho_{PL,PM} = 0.89$ and the stability is $c_{PM} = 0.54$. The set $w_{PM}$ is still a solution for the whole database of 198 proteins. However, the stability in the whole database is reduced to $c^{198}_{PM} = 0.035$, as shown in Fig. 2. The stability as a function of $M$ apparently asymptotes to a non-zero value. This fact might be regarded as a measure of the degree of design to which the proteins in the database have been subjected.

Two questions can be asked to compare the two methods of extraction of potentials.

Which is the best method to recover the true potential knowing only the sequences and their ground states?
A possible answer is given by the correlation coefficient between the true potential and the derived one. The Z-score method gave $\rho = 0.84$ and the perceptron method $\rho = 0.69$. The correlation between $w_{PL}$ and $w_{ZL}$ is $\rho = 0.79$.

Which is the method that gives a "better" potential? We considered four different measures of performance to answer this question.

1. The Z-score measures the gap between the ground state and the average energy of a given sequence on a set of decoys. The perceptron method measures the gap between the ground state and the first excited state. Thus, the previous question can be rephrased as: how does $\langle Z \rangle_{\mathrm{harm}}$ of $w_{PL}$ compare with $\langle Z \rangle_{\mathrm{harm}}$ of $w_{ZL}$? In the case of $w_{PL}$ we obtained $\langle Z \rangle_{\mathrm{harm}} = 6.44$, and for $w_{ZL}$ we obtained $\langle Z \rangle_{\mathrm{harm}} = 6.93$.

2. Another way to formulate the same question is: how does the stability $c_{ZL}$ of $w_{ZL}$ compare with the stability $c_{PL}$ of $w_{PL}$? We found $c_{ZL} = 0.05$ and $c_{PL} = 0.45$.

3. Beyond the Z-score and the gap to the first excited state, a third way to quantify the stability is to look at the correlation between the overlap $Q$ and the difference in energy with the ground state, $\Delta E$. The overlap $Q$ is defined as

$$Q = \frac{N_p}{N_c} \tag{18}$$

where $N_p$ is the number of contacts present both in the native contact map and in the contact map of the decoy, and $N_c$ is the number of contacts in the native contact map (contacts along the three main diagonals are not counted). A good potential should assign low energy to conformations close to the native state and high energy to those very different from it. As shown in Fig. 3, $w_{ZL}$ and $w_{PL}$ provide approximately the same correlation. The Z-score method reduces the width of the distribution of the energy of the decoys, as also shown in Fig. 4. The perceptron, on the other hand, for large $Q$ pushes up the bottom of the energies, enlarging the gap to the ground state.

4. A fourth way to assess the "quality" of the recovered potential is to check whether the ground state obtained using it is indeed the correct ground state.
For each of the 198 designed sequences the corresponding compact structures were the ground states of the "true" MJ potential. The energy parameters obtained by the perceptron method depend on the particular set of decoys that were used. In the lattice case discussed above, we have used only maximally compact decoys. Performing a Monte Carlo energy minimization on the entire space of conformations, using the derived energy parameters $w_{PL}$, we found that for 6 of the 198 sequences there were non-maximally compact conformations whose energy was lower than that of the "true" ground state. Using the Z-score derived energy parameters, the same test gave 8 mistakes.

Finally, we observe that the database was obtained by minimizing the Z-score in the space of sequences at fixed conformation. The recovery of the parameter set was carried out by minimizing the Z-score again, this time in the space of parameters. This procedure can introduce a bias, which complicates the comparison with the perceptron method of deriving the energy parameter set.

THREADING

We present here the results of an experiment of gapless threading, using the two methods. We considered a test set of 100 PDB proteins, for which decoys were derived by threading the sequence of every protein through the structures of all the longer ones. We used two sets of energy parameters: one, $w_{PT}$, obtained by perceptron learning and the second, $w_{ZT}$, obtained by Z-score optimization. The set $w_{PT}$ was obtained by learning the solution of maximal stability for an independent set of 123 proteins and 836020 decoys [13]. The set $w_{ZT}$ was obtained by MS. For both potentials an all-atom definition of contacts was used, with a threshold $R_c = 4.5$ Å. The correlation between $w_{PT}$ and $w_{ZT}$ is $\rho = 0.61$. Testing the two methods on 100 proteins (that do not appear in the set that was used to derive the contact energies) gave results of similar quality (see Table I).
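To make the decoy construction concrete, here is a minimal Python sketch of gapless threading on contact maps. It rests on assumptions that the text does not spell out: every contiguous window of a longer structure is taken as one decoy (so a structure of length m yields m - n + 1 decoys for a sequence of length n), residues are coded as integers 0..19, and contacts with sequence separation below `min_sep` are ignored, mirroring the convention stated for the overlap Q. The function names and the `min_sep` parameter are hypothetical.

```python
import numpy as np


def gapless_threading_decoys(contact_map, seq_len):
    """Yield every contiguous seq_len x seq_len window of a longer contact map.

    contact_map : boolean array (m, m) of a structure with m >= seq_len.
    Assumption of this sketch: each window counts as one decoy, so a
    structure of length m contributes m - seq_len + 1 decoys.
    """
    m = contact_map.shape[0]
    for start in range(m - seq_len + 1):
        stop = start + seq_len
        yield contact_map[start:stop, start:stop]


def contact_energy(seq, cmap, w, min_sep=2):
    """Pairwise contact energy of a sequence mounted on a contact map.

    seq     : integer array (n,) of residue types 0..19.
    cmap    : boolean array (n, n), symmetric contact map.
    w       : array (20, 20), symmetric table of contact energies.
    min_sep : smallest |i - j| counted as a contact (hypothetical parameter).
    """
    i, j = np.nonzero(np.triu(cmap, k=min_sep))
    return float(w[seq[i], seq[j]].sum())


def count_misclassified(seq, native_map, decoy_maps, w):
    """Number of decoys whose energy is lower than the native energy."""
    e_native = contact_energy(seq, native_map, w)
    return sum(contact_energy(seq, d, w) < e_native for d in decoy_maps)
```

A protein is then classified correctly by a given parameter set when `count_misclassified` returns zero over all of its decoys.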
The perceptron solution misclassified fewer decoys; the Z-score solution, on the other hand, assigned larger Z-scores to the native states, as expected. The distributions of the overlap $Q$ and of the energy $\Delta E$ are given for the two methods and for all the 100 proteins in Fig. 5. For both methods we considered the correlation between $Q$ and $\Delta E$ (see Fig. 6). In these figures we compare two cases. The first is the protein 1mol, which was classified correctly against 11191 decoys by both methods; it has 94 residues. The second case is the protein 1isu, of 62 residues, which both methods failed to classify correctly. Of the 14016 decoys that were produced, the perceptron assigned lower than native energy to 138 decoys and the Z-score to 208.

For the vast majority of the studied proteins (95%) we found a reasonably good correlation between $Q$ and $\Delta E$ (see Figs. 6(a,b)), in that the single map with high $Q$ (i.e. the native one) has lower energy than the low-$Q$ decoys. We must add, however, the following note of caution. It is possible that this observed correlation is present only because gapless threading fails to generate challenging, high-$Q$ decoys. Within the present calculations we are allowed to use only the pairwise contact approximation to some much more complicated "true" contact-map potential. One cannot rule out the possibility that this is such a poor approximation to the true potential that, had we generated better high-$Q$ decoys, the observed correlation would have disappeared. The case of lattice proteins cannot guide us in resolving this question. For lattice proteins the "true potential" that produced the native folds was a pairwise contact potential, whereas the native structures of the threading experiment were stabilized by the (presumably much more complicated) "true" potential that governs protein folding under physiological conditions.

We also looked for some relationship between $Q$ and $\Delta E$ of the low-energy decoys.
For each protein we measured the quantity

$$\theta = \min_k \arctan\left(\frac{\Delta E_k / |E_0^k|}{1 - Q_k}\right) \tag{19}$$

where $k$ runs over all the decoys generated for that particular protein and $E_0^k$ is the energy of the native state of protein $k$. For the cases shown in Fig. 6 these minimal values are reached at the decoy of minimal energy. The distribution of the angles $\theta$ is shown in Fig. 7. For both methods we get fairly similar distributions, with a large peak on the positive side corresponding to successful classifications and a broad tail on the negative side corresponding to misclassified folds.

Figure 8 shows the potentials obtained by the two methods for real proteins. There are several important features shared by both potentials. (1) Cysteine, hydrophobic residues and aromatic residues except proline (C,M,F,I,L,V,W,Y,H) attract each other. (2) Most polar residues repel each other and the hydrophobic ones. (3) Interactions between charged residues (D,E,K,R) are much weaker than those between hydrophobic ones. Although most of the charged interactions have the right sign, they are hardly noticeable among the other interactions of polar residues. These properties are easy to understand, since hydrophobic and aromatic residues tend to cluster in the protein core while polar ones are spread over the protein surface. Hence contacts between hydrophobic and aromatic residues are found much more frequently than contacts between polar ones. Cysteines frequently form stabilizing disulphide bonds and are usually located in the protein core. It is also clear why both potentials have weak electrostatic interactions. Salt bridges formed by pairs of oppositely charged residues, although contributing to the stability of some proteins [19], are known to be rare and less important than hydrophobic interactions [20].

Focusing on the differences between the two potentials, we notice that $w_{ZT}$ (Fig. 8B) has all interaction energies distributed much more evenly among residues.
In contrast, $w_{PT}$ has very diverse interactions, especially those between polar residues. We suggest two possible explanations for the smoothness of interactions in $w_{ZT}$ versus their diversity in $w_{PT}$. First, it is known that potentials obtained by optimization of Z-scores (or similar functions) tend to underestimate repulsive interactions [8,21] and hence have smoother polar-polar interactions. Secondly, $w_{PT}$ was obtained by discriminating the native fold from explicit decoys. The learning procedure, focused on a few low-energy decoys, must have learned certain features specific to these decoys and hence produced a diverse pattern of polar-polar interactions.

In summary, the interactions between amino acids provided by both potentials agree well with the physical and chemical properties of these amino acids and with known features of native proteins.

DISCUSSION

In this paper we presented a detailed comparison of two methods to derive energy parameters from known protein structures: the Z-score optimization [8] and perceptron learning [1]. First we chose an exactly solvable model, lattice 27-mers, where sequences were designed to fold to their respective "native" conformations with certain "true" potentials. Our analysis showed that both methods recovered potentials that were sufficiently close to the "true" ones. Besides that, the maximum-stability perceptron was able to find the potential with the largest energy gap between the ground state and the "first" excited state, while the potentials derived using the Z-score optimization provided slightly lower Z-scores for the native structures of lattice proteins. This is as expected. It can be seen in Fig. 3 that the large gap provided by the maximum-stability perceptron is between the native state and the "first excited" state, which is structurally very similar to the native one, having a high overlap with the native state at $Q \approx 0.8$.

Which method may be better suited for practical applications? Each one has its strengths and weaknesses.
The major strengths of the Z-score method are the possibility of using implicit decoys and its relative computational simplicity. Its weakness is that it does not guarantee that the native state is lowest in energy with the derived parameters, i.e. that there are no "outliers" that feature lower energy than the native state. The strengths and weaknesses of the perceptron method are complementary to those of the Z-score method. Computational efficiency is at issue here, especially since the perceptron method requires explicit decoys, whose number can be great. In this regard the observation that in practice the perceptron method used only a tiny fraction of all 103346 lattice conformations is remarkable and telling. It certainly requires a deeper analysis, which will be presented elsewhere. Energy parameters for lattice proteins, for which the contact potential is the true potential, can be derived by both methods with no remarkable differences.

Further, we tested both methods on a gapless threading application. Two potentials were used. One, generic, was derived earlier by MS to minimize the Z-score of native proteins against implicit decoys [8]. VD derived the other potential used here by perceptron learning of 836020 decoys, obtained by gapless threading for 123 proteins (using the all-atom definition of contact and a threshold of 4.5 Å) [13]. The 100 proteins that were used in the calculations reported here, to test the performance of both potentials, were included neither in the training set for perceptron learning nor in the set of the Z-score-based derivation in [8]. Both potentials performed well in gapless threading tests, providing recognition of the native state in roughly 95% of all presented proteins. Importantly, most of the proteins whose native states were not recognized by either of the methods were "special" in the sense that they are stabilized by certain "extraneous" factors such as metal ions, quaternary interactions, etc. (see also Ref. [7]).
In this paper we provided an analysis of two methods of deriving potentials for protein structure prediction, using rigorous tests on lattice proteins and gapless threading. Both methods performed with approximately equal efficiency, alleviating the major concerns that the Z-score may not be able to provide potentials that discriminate against a few special lowest-energy decoys and that the perceptron method may fail to deliver low Z-scores to native structures. Such cross-validation is important for the application of either of the potential derivation methods to real protein structure prediction problems.

Which method is preferable? The answer depends on the specific application. When explicit decoys are problematic to obtain, the Z-score method, which does not require them, can be used. On the other hand, in cases when explicit decoys are available, perceptron learning may provide a reliable set of potentials, provided that the problem is "learnable" [1,17]. Our study of the gapless threading application indicates that the learnability of the problem, for the perceptron, may depend on the inclusion of a small number of "outliers" in the training set, i.e. proteins that are stabilized by extraneous factors such as quaternary interactions or a large number of disulfides. This is consistent with the situation in the Z-score optimization methods, where the addition of such proteins to the training set also rendered the problem unsolvable, in the sense that no convergence to any potential was obtained. These findings teach us an important lesson: the choice of the training set is crucial, and the proteins in the training set should be stabilized by the same physical factors as those proteins whose structure is being determined using the derived potentials. Since such physical factors are not known a priori for a new protein, a high Z-score, or a low maximal stability with the perceptron-derived potentials, may be an indication that such a situation has been encountered.
We used gapless threading to test the methods of potential derivation. The advantage of this approach is its extreme simplicity. However, we should note that gapless threading is not a very practical tool for real-life structure prediction, because the actual native structure of the query sequence is never in the set of conformations scanned by a threading simulation. An actual threading calculation aims to select analogs of the native state of the query sequence in the ensemble of structures scanned. Gapless threading is generally not capable of selecting or recognizing analogs (Mirny and Shakhnovich, unpublished data). To this end a more advanced threading technique should be used, one that allows gaps and insertions in sequence and structure [18]. This comes, however, at the price of increasing the number of decoys. The need to discriminate against a larger number of decoys requires better discriminating potentials and/or more detailed models of proteins. A combination of perceptron learning, for discriminating against the most difficult lowest-energy decoys, with Z-score optimization, to discriminate against the mass of "average" decoys, may be a way to address these most challenging problems of protein structure prediction.

Acknowledgments

This work was supported by grants from the US-Israel Binational Science Foundation (BSF), the Germany Israel Science Foundation (GIF), the Minerva Foundation, the European Molecular Biology Organization (EMBO) and the National Institute of Health (NIH); Grant GM52126.

REFERENCES

[1] Vendruscolo M, Domany E. Pairwise contact potentials are unsuitable for protein folding. J Chem Phys 1998; 109: 11101-11108.
[2] Goldstein R, Luthey-Schulten ZA, Wolynes PG. Optimal protein-folding codes from spin-glass theory. Proc Natl Acad Sci USA 1992; 89: 4918-4922.
[3] Goldstein R, Luthey-Schulten ZA, Wolynes PG. Protein tertiary structure recognition using optimized Hamiltonians with local interactions. Proc Natl Acad Sci USA 1992; 89: 9029-9033.
[4] Maiorov V, Crippen G. Contact potential that recognizes the correct folding of globular proteins. J Mol Biol 1992; 227: 876-888.
[5] Hao MH, Scheraga HA. How optimization of potential function affects protein folding. Proc Natl Acad Sci USA 1996; 93: 4984-4989.
[6] Settanni G, Micheletti C, Banavar JR, Maritan A. Determination of optimal effective interactions between amino acids in globular proteins. http://xxx.lanl.gov/abs/cond-mat/9902364 1999.
[7] Bastolla U, Vendruscolo M, Knapp EW. Proc Natl Acad Sci USA 2000; in press.
[8] Mirny LA, Shakhnovich EI. How to derive a protein folding potential? A new approach to an old problem. J Mol Biol 1996; 264: 1164-1179.
[9] Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys 1953; 21: 1087-1092.
[10] Rosenblatt F. Principles of neurodynamics. Spartan Books, New York 1962.
[11] Minsky ML, Papert SA. Perceptrons. MIT Press, Cambridge MA 1969.
[12] Krauth W, Mezard M. Learning algorithms with optimal stability in neural networks. J Phys A 1987; 20: L745-L752.
[13] Vendruscolo M, Najmanovich R, Domany E. Can a pairwise contact potential stabilize native protein folds against decoys obtained by threading? Proteins 2000; 38: 134-148.
[14] Miyazawa S, Jernigan RL. Estimation of effective inter-residue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 1985; 18: 534-552.
[15] Shakhnovich EI, Gutin A. Engineering of stable and fast-folding sequences of model proteins. Proc Natl Acad Sci USA 1993; 90: 7195-7199.
[16] Shakhnovich EI, Gutin A. A novel approach to design of stable proteins. Prot Eng 1993; 6: 793-800.
[17] Vendruscolo M, Najmanovich R, Domany E. Protein folding in contact map space. Phys Rev Lett 1999; 82: 656-659.
[18] Mirny LA, Shakhnovich EI. Protein structure prediction by threading. When it works and when it does not. J Mol Biol 1998; 264: 1164-1179.
[19] Kumar S, Nussinov R. Salt bridge stability in monomeric proteins. J Mol Biol 1999; 293: 1241-1255.
[20] Xu D, Lin SL, Nussinov R. Protein binding versus protein folding: the role of hydrophilic bridges. J Mol Biol 1997; 265: 69-94.
[21] Zhang L, Skolnick J. How do potentials derived from structural databases relate to "true" potentials? Protein Sci 1998; 7: 112-122.

Table captions

Table 1
Results of the gapless threading fold recognition experiment. We used 100 proteins and 698,898 decoys.

TABLES

Potential   Misclassified Proteins   Misclassified Decoys   Z-score
VD          5                        192                    -7.20
MS          7                        1261                   -8.44

TABLE I. Vendruscolo et al., Table 1

Figure captions

Figure 1
Number of updates that were necessary for each example to converge to the solution of maximal stability. Fewer than one hundred examples participated in the learning process.

Figure 2
Stability $c_{PM}$ as a function of the number $M$ of proteins in the database (full squares). The lower curve is the stability $c^{198}_{PM}$ in the complete set of 198 proteins. We also show the stability for a single protein averaged over all the $M$ proteins in the set (curve with error bars).

Figure 3
Contour plot of the energy difference $\Delta E$ between decoys and native state and the overlap $Q$ with the native state. (a) Using the set of true energy parameters; (b) using the set $w_{ZL}$ of energy parameters; (c) using the set $w_{PL}$ of energy parameters. Contour levels are spaced logarithmically.

Figure 4
Histogram of the energy differences between the decoys and the native states, for the true energy parameters and as obtained by the two methods.

Figure 5
(a) Histogram of the energy differences between the decoys and the native states, as obtained by the two methods for all 100 proteins tested. (b) Histogram of the overlap $Q$ with the native state for the set of decoys used in the threading test.

Figure 6
Scatter plot of the energy difference $\Delta E$ between decoys and native state and the overlap $Q$ with the native state. (a) Using the set $w_{ZT}$ of energy parameters for protein 1mol, correctly classified by the Z-score method. (b) Using the set $w_{PT}$ of energy parameters for protein 1mol, correctly classified by the perceptron method. (c) Using the set $w_{ZT}$ of energy parameters for protein 1isu, incorrectly classified by the Z-score method. (d) Using the set $w_{PT}$ of energy parameters for protein 1isu, incorrectly classified by the perceptron method.

Figure 7
Histogram of the angle $\theta$. The perceptron and the Z-score provide similar distributions.

Figure 8
Comparison of the two sets of pairwise contact energy parameters $w_{PT}$ (A) and $w_{ZT}$ (B).